
⚡️ Speed up function _byte_to_line_index by 39% in PR #1199 (omni-java) #1611

Open
codeflash-ai[bot] wants to merge 2 commits into omni-java from codeflash/optimize-pr1199-2026-02-20T14.22.56

@codeflash-ai codeflash-ai bot commented Feb 20, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 39% (1.39x) speedup for _byte_to_line_index in codeflash/languages/java/instrumentation.py

⏱️ Runtime: 924 microseconds → 663 microseconds (best of 163 runs)
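The "39%" figure is consistent with a speedup measured relative to the optimized runtime, i.e. (old − new) / new. A quick worked check using only the two runtimes reported above shows how that differs from the fraction of time actually removed:

```python
# Best-of-163 runtimes reported above, in microseconds.
original_us = 924
optimized_us = 663

# Speedup relative to the new runtime: (old - new) / new.
speedup = (original_us - optimized_us) / optimized_us
# Ratio form: how many times faster the optimized version is.
ratio = original_us / optimized_us
# Fraction of the original runtime that was eliminated.
time_reduction = (original_us - optimized_us) / original_us

print(f"speedup: {speedup:.1%}, ratio: {ratio:.2f}x, time saved: {time_reduction:.1%}")
# speedup: 39.4%, ratio: 1.39x, time saved: 28.2%
```

In other words, the optimized function is about 1.39x as fast, which corresponds to cutting roughly 28% of the original runtime.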

📝 Explanation and details

The optimized code achieves a 39% runtime improvement through two key micro-optimizations that reduce per-call overhead in this frequently-executed helper function:

Primary Optimizations

  1. Direct import binding: Changed from bisect.bisect_right() to importing bisect_right directly as _bisect_right. This eliminates the .bisect_right attribute lookup on the bisect module on every call. In the line profiler, the bisect line's total time dropped from 977,304 ns to 887,516 ns, a saving of about 90 µs summed across all profiled calls (not per invocation).

  2. Conditional expression over max(): Replaced max(0, idx) with idx if idx > 0 else 0. This avoids looking up and calling the builtin max(), reducing this line's total execution time by about 39% (622,046 ns → 379,545 ns per the profiler).

Why This Matters

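The function maps byte offsets to line indices using binary search, a core operation that happens 2,158 times in the profiled workload. Because the function body is tiny, fixed per-call overhead makes up a large share of its runtime, so these micro-optimizations show up clearly in the measurements: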

  • Test results show speedups of roughly 26-71% across all cases, with the largest improvements (about 55-71%) in edge cases like empty lists or single-element lists, where the saved per-call overhead is a larger proportion of total execution time
  • Large-scale tests (1000-line files) still achieve roughly 26-42% improvements; the relative gain tends to be somewhat smaller there because the saved overhead is constant per call while the binary search itself costs O(log n)
  • The optimization is particularly effective for hot-path scenarios like sequential offset queries (42.6% faster)
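The per-call overhead claims can be checked independently with the standard `timeit` module. The sketch below times the two lookup styles and the two clamping expressions in isolation; absolute numbers depend on the interpreter version and machine, so it only reports timings rather than asserting a particular speedup. The 1000-line, 50-byte-per-line input mirrors one of the generated tests and is otherwise an arbitrary choice.

```python
import timeit

# Shared setup. timeit places setup code inside its generated timing function,
# so these names become local variables there: the bisect rows isolate the
# attribute access, not the global-name lookup.
SETUP = """
import bisect
from bisect import bisect_right as _bisect_right
starts = [i * 50 for i in range(1000)]
offset = 25_000
idx = 7
"""

STATEMENTS = {
    "bisect.bisect_right (attribute lookup)": "bisect.bisect_right(starts, offset)",
    "_bisect_right (direct binding)": "_bisect_right(starts, offset)",
    "max(0, idx)": "max(0, idx)",
    "idx if idx > 0 else 0": "idx if idx > 0 else 0",
}


def per_call_ns(stmt: str, number: int = 100_000, repeat: int = 5) -> float:
    # Best-of-`repeat` total time divided by the loop count, in nanoseconds.
    best = min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    for label, stmt in STATEMENTS.items():
        print(f"{label:42s} {per_call_ns(stmt):6.1f} ns/call")
```

Taking the minimum of several repeats, as Codeflash's "best of N runs" does, reduces noise from other processes.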

The changes preserve all behavior, including edge-case handling (negative offsets, empty lists), while removing an attribute lookup and a builtin call from every invocation.
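One way to gain confidence that behavior is preserved is to compare the bisect-based lookup against a straightforward linear-scan reference on random sorted inputs, including empty lists, duplicate starts, and negative offsets. This is an illustrative sketch: `bisect_line_index` and `reference_line_index` are hypothetical names, and `bisect_line_index` mirrors the optimized logic as described above rather than the actual source.

```python
import random
from bisect import bisect_right as _bisect_right


def bisect_line_index(byte_offset: int, line_byte_starts: list[int]) -> int:
    # Optimized form as described: direct binding plus conditional clamp.
    idx = _bisect_right(line_byte_starts, byte_offset) - 1
    return idx if idx > 0 else 0


def reference_line_index(byte_offset: int, line_byte_starts: list[int]) -> int:
    # Index of the last start <= byte_offset, or 0 if there is none.
    result = 0
    for i, start in enumerate(line_byte_starts):
        if start <= byte_offset:
            result = i
        else:
            break  # starts are sorted, so no later start can match
    return result


def check_equivalence(trials: int = 2_000, seed: int = 0) -> int:
    rng = random.Random(seed)
    for _ in range(trials):
        # Sorted starts, possibly empty and possibly containing duplicates.
        starts = sorted(rng.randint(0, 200) for _ in range(rng.randint(0, 20)))
        offset = rng.randint(-50, 250)
        assert bisect_line_index(offset, starts) == reference_line_index(offset, starts), (offset, starts)
    return trials


print(check_equivalence())  # 2000
```

The linear scan is O(n) per query, which is exactly why the real function uses `bisect_right`; here it only serves as an easy-to-audit oracle.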

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 2158 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests
import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_basic_mapping_at_and_between_starts():
    # Basic, small, easy-to-reason-about line byte starts
    starts = [0, 10, 20]  # three lines starting at bytes 0, 10, and 20

    # If the offset is exactly at a start, we expect that start's index.
    # bisect_right places the insertion point to the right of equal entries,
    # then the function subtracts 1, yielding the index of that equal start.
    codeflash_output = _byte_to_line_index(0, starts) # 1.10μs -> 741ns (48.7% faster)
    codeflash_output = _byte_to_line_index(10, starts) # 611ns -> 431ns (41.8% faster)
    codeflash_output = _byte_to_line_index(20, starts) # 411ns -> 300ns (37.0% faster)

    # Offsets between starts should map to the previous start's index.
    codeflash_output = _byte_to_line_index(5, starts) # 440ns -> 271ns (62.4% faster)
    codeflash_output = _byte_to_line_index(15, starts) # 391ns -> 270ns (44.8% faster)

    # Offsets beyond the last start should map to the last index.
    codeflash_output = _byte_to_line_index(9999, starts) # 340ns -> 220ns (54.5% faster)

def test_edge_empty_list_and_negative_offsets_and_single_element():
    # If there are no starts, bisect_right returns 0 -> idx = -1 -> max(0, -1) => 0
    codeflash_output = _byte_to_line_index(0, []) # 912ns -> 541ns (68.6% faster)
    codeflash_output = _byte_to_line_index(123, []) # 461ns -> 281ns (64.1% faster)
    codeflash_output = _byte_to_line_index(-100, []) # 330ns -> 201ns (64.2% faster)

    # Negative offsets with non-empty starts also should clamp to 0
    starts = [10, 20, 30]
    codeflash_output = _byte_to_line_index(-1, starts) # 431ns -> 330ns (30.6% faster)

    # Single-element list: regardless of offset relative to the single start,
    # the algorithm always returns 0 because idx will be 0 or -1, and max keeps 0.
    single = [5]
    for off in (-10, 0, 4, 5, 6, 100):
        codeflash_output = _byte_to_line_index(off, single) # 2.12μs -> 1.34μs (58.3% faster)

def test_edge_duplicates_and_float_support():
    # Duplicate start entries: bisect_right will place insertion point to the right
    # of duplicates, so an offset equal to the duplicated start yields the last duplicate index.
    starts_with_duplicates = [0, 10, 10, 20]
    codeflash_output = _byte_to_line_index(0, starts_with_duplicates) # 1.07μs -> 671ns (59.8% faster)
    # At 10, there are two identical starts at indices 1 and 2.
    # bisect_right places insertion after them -> idx = insertion-1 = 2
    codeflash_output = _byte_to_line_index(10, starts_with_duplicates) # 581ns -> 401ns (44.9% faster)
    # Between 10 and 20 should map to the last 10's index (2)
    codeflash_output = _byte_to_line_index(11, starts_with_duplicates) # 391ns -> 260ns (50.4% faster)

    # Although the function is annotated for ints, bisect works with floats as well.
    # Verify correct behavior with float starts and float offsets.
    float_starts = [0.0, 2.5, 5.5]
    codeflash_output = _byte_to_line_index(0.0, float_starts) # 451ns -> 321ns (40.5% faster)
    codeflash_output = _byte_to_line_index(2.4, float_starts) # 401ns -> 251ns (59.8% faster)
    codeflash_output = _byte_to_line_index(2.5, float_starts) # 411ns -> 290ns (41.7% faster)
    codeflash_output = _byte_to_line_index(5.5, float_starts) # 391ns -> 260ns (50.4% faster)
    codeflash_output = _byte_to_line_index(6.0, float_starts) # 350ns -> 210ns (66.7% faster)

def test_large_scale_mapping_consistency():
    # Large-scale test with 1000 line starts spaced by 10 bytes each.
    n = 1000
    starts = [i * 10 for i in range(n)]  # deterministic, sorted starts [0,10,20,...,9990]

    # 1) Offsets that are exactly at start points should map to that start's index.
    #    We test many such offsets to ensure bisect_right behavior is preserved at scale.
    for i in range(0, n, 50):  # step by 50 to keep assertions informative while covering range
        offset = i * 10
        expected = i
        codeflash_output = _byte_to_line_index(offset, starts) # 10.0μs -> 7.37μs (36.0% faster)

    # 2) Offsets that are in the middle between two starts should map to the lower index.
    #    For each i, offset = i*10 + 5 lies between start[i] and start[i+1], so expect i.
    for i in range(0, n - 1, 50):
        offset = i * 10 + 5
        expected = i
        codeflash_output = _byte_to_line_index(offset, starts) # 9.24μs -> 6.68μs (38.3% faster)

    # 3) Offsets beyond the last start should map to the last index (n-1).
    codeflash_output = _byte_to_line_index(n * 10 + 12345, starts) # 441ns -> 330ns (33.6% faster)

    # 4) A comprehensive sweep (1000 checks) to ensure correctness across many offsets.
    #    This is a heavier check but still deterministic and bounded.
    for i in range(n):
        # Pick an offset that is guaranteed to map to index i: choose i*10 (exact start)
        offset_exact = i * 10
        codeflash_output = _byte_to_line_index(offset_exact, starts) # 416μs -> 299μs (39.0% faster)

        # Also pick a midpoint between i and i+1 (except for last index).
        if i < n - 1:
            offset_mid = i * 10 + 7  # inside the range for index i
            codeflash_output = _byte_to_line_index(offset_mid, starts)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import bisect

# imports
import pytest
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_single_line_offset_at_start():
    """Test offset 0 with a single-line file returns index 0."""
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 1.12μs -> 721ns (55.6% faster)

def test_single_line_offset_in_middle():
    """Test offset within a single line returns index 0."""
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 991ns -> 621ns (59.6% faster)

def test_two_lines_offset_in_first_line():
    """Test offset in first line of multi-line file."""
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 971ns -> 601ns (61.6% faster)

def test_two_lines_offset_in_second_line():
    """Test offset in second line of multi-line file."""
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(15, line_byte_starts); result = codeflash_output # 912ns -> 651ns (40.1% faster)

def test_two_lines_offset_at_second_line_start():
    """Test offset exactly at the start of second line."""
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 942ns -> 651ns (44.7% faster)

def test_three_lines_offset_in_middle_line():
    """Test offset in middle line of three-line file."""
    line_byte_starts = [0, 10, 20]
    codeflash_output = _byte_to_line_index(15, line_byte_starts); result = codeflash_output # 942ns -> 651ns (44.7% faster)

def test_three_lines_offset_in_last_line():
    """Test offset in last line of three-line file."""
    line_byte_starts = [0, 10, 20]
    codeflash_output = _byte_to_line_index(25, line_byte_starts); result = codeflash_output # 902ns -> 621ns (45.2% faster)

def test_large_offset_values():
    """Test with large byte offset values."""
    line_byte_starts = [0, 1000, 2000, 3000]
    codeflash_output = _byte_to_line_index(2500, line_byte_starts); result = codeflash_output # 922ns -> 601ns (53.4% faster)

def test_many_lines():
    """Test with many lines (10 lines)."""
    line_byte_starts = [i * 100 for i in range(10)]
    codeflash_output = _byte_to_line_index(550, line_byte_starts); result = codeflash_output # 962ns -> 661ns (45.5% faster)

def test_empty_line_byte_starts():
    """Test with empty line_byte_starts list."""
    line_byte_starts = []
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 862ns -> 551ns (56.4% faster)

def test_empty_list_with_nonzero_offset():
    """Test with empty list and non-zero offset."""
    line_byte_starts = []
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 871ns -> 510ns (70.8% faster)

def test_offset_zero_with_multiple_lines():
    """Test offset 0 with multiple lines always returns 0."""
    line_byte_starts = [0, 5, 10, 15]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 1.02μs -> 721ns (41.7% faster)

def test_offset_before_all_lines():
    """Test with offset before the first line start."""
    line_byte_starts = [10, 20, 30]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 982ns -> 651ns (50.8% faster)

def test_offset_exactly_at_first_line_start():
    """Test offset exactly at first line start (non-zero)."""
    line_byte_starts = [10, 20, 30]
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 1.00μs -> 662ns (51.4% faster)

def test_offset_beyond_all_lines():
    """Test with offset far beyond the last line."""
    line_byte_starts = [0, 10, 20]
    codeflash_output = _byte_to_line_index(1000, line_byte_starts); result = codeflash_output # 902ns -> 661ns (36.5% faster)

def test_single_large_offset_value():
    """Test with a very large single offset value."""
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(1000000, line_byte_starts); result = codeflash_output # 962ns -> 582ns (65.3% faster)

def test_lines_with_zero_starts():
    """Test with line_byte_starts containing only zeros."""
    line_byte_starts = [0, 0, 0]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 972ns -> 681ns (42.7% faster)

def test_lines_with_consecutive_starts():
    """Test with consecutive line starts (single-character lines)."""
    line_byte_starts = [0, 1, 2, 3, 4, 5]
    codeflash_output = _byte_to_line_index(3, line_byte_starts); result = codeflash_output # 1.02μs -> 642ns (59.2% faster)

def test_negative_like_behavior():
    """Test clamping when the offset precedes the only line start (idx = -1 is clamped to 0)."""
    line_byte_starts = [100]
    codeflash_output = _byte_to_line_index(50, line_byte_starts); result = codeflash_output # 902ns -> 591ns (52.6% faster)

def test_very_large_line_byte_starts():
    """Test with very large byte start values."""
    line_byte_starts = [0, 1000000, 2000000, 3000000]
    codeflash_output = _byte_to_line_index(2500000, line_byte_starts); result = codeflash_output # 922ns -> 601ns (53.4% faster)

def test_offset_at_exact_line_boundary():
    """Test multiple offsets at exact line boundaries."""
    line_byte_starts = [0, 100, 200, 300]
    # Offset at the second line's start maps to index 1 (the second line)
    codeflash_output = _byte_to_line_index(100, line_byte_starts) # 972ns -> 671ns (44.9% faster)
    # Offset at the third line's start maps to index 2 (the third line)
    codeflash_output = _byte_to_line_index(200, line_byte_starts) # 561ns -> 391ns (43.5% faster)
    # Offset at the fourth line's start maps to index 3 (the fourth line)
    codeflash_output = _byte_to_line_index(300, line_byte_starts) # 420ns -> 300ns (40.0% faster)

def test_large_file_1000_lines():
    """Test with 1000 lines (large file scenario)."""
    # Create line starts for a file with 1000 lines, each 50 bytes
    line_byte_starts = [i * 50 for i in range(1000)]
    # Test offset in the middle of the file
    codeflash_output = _byte_to_line_index(25000, line_byte_starts); result = codeflash_output # 1.25μs -> 912ns (37.3% faster)

def test_large_file_offset_near_start():
    """Test large file with offset near the start."""
    line_byte_starts = [i * 50 for i in range(1000)]
    codeflash_output = _byte_to_line_index(100, line_byte_starts); result = codeflash_output # 1.02μs -> 762ns (34.0% faster)

def test_large_file_offset_near_end():
    """Test large file with offset near the end."""
    line_byte_starts = [i * 50 for i in range(1000)]
    codeflash_output = _byte_to_line_index(49900, line_byte_starts); result = codeflash_output # 1.11μs -> 872ns (27.5% faster)

def test_large_file_offset_at_very_end():
    """Test large file with very large offset (beyond file)."""
    line_byte_starts = [i * 50 for i in range(1000)]
    codeflash_output = _byte_to_line_index(100000, line_byte_starts); result = codeflash_output # 1.12μs -> 852ns (31.7% faster)

def test_many_queries_consistency():
    """Test multiple queries on same file for consistency."""
    line_byte_starts = [i * 100 for i in range(500)]
    offsets_to_test = [0, 50, 100, 500, 1000, 15000, 49900]
    expected_results = [0, 0, 0, 5, 10, 150, 499]
    
    for offset, expected in zip(offsets_to_test, expected_results):
        codeflash_output = _byte_to_line_index(offset, line_byte_starts); result = codeflash_output # 4.23μs -> 3.10μs (36.2% faster)

def test_sequential_offsets_monotonic():
    """Test that sequential offsets produce monotonically non-decreasing results."""
    line_byte_starts = [i * 50 for i in range(100)]
    previous_result = -1
    
    # Test offsets at regular intervals
    for offset in range(0, 5000, 100):
        codeflash_output = _byte_to_line_index(offset, line_byte_starts); result = codeflash_output # 20.6μs -> 14.4μs (42.6% faster)
        previous_result = result

def test_binary_search_efficiency():
    """Test that function handles large lists efficiently (via bisect)."""
    # Create a very large line list (1000 lines)
    line_byte_starts = [i * 100 for i in range(1000)]
    
    # Test multiple offsets to ensure bisect works correctly at scale
    test_cases = [
        (0, 0),
        (50, 0),
        (100, 0),
        (5000, 50),
        (50000, 500),
        (99900, 999),
    ]
    
    for offset, expected_line in test_cases:
        codeflash_output = _byte_to_line_index(offset, line_byte_starts); result = codeflash_output # 3.91μs -> 2.75μs (42.2% faster)

def test_dense_line_starts():
    """Test with very dense line starts (every byte)."""
    line_byte_starts = list(range(1000))
    codeflash_output = _byte_to_line_index(500, line_byte_starts); result = codeflash_output # 1.08μs -> 861ns (25.7% faster)

def test_sparse_line_starts():
    """Test with very sparse line starts (large gaps)."""
    line_byte_starts = [0, 10000, 20000, 30000, 40000, 50000]
    codeflash_output = _byte_to_line_index(25000, line_byte_starts); result = codeflash_output # 1.03μs -> 711ns (45.1% faster)

def test_mixed_sized_lines_large_scale():
    """Test with mixed line sizes at large scale."""
    # Create line starts with varying gaps
    line_byte_starts = [0]
    current = 0
    for i in range(500):
        # Alternate between 50 and 100 byte lines
        gap = 50 if i % 2 == 0 else 100
        current += gap
        line_byte_starts.append(current)
    
    codeflash_output = _byte_to_line_index(current // 2, line_byte_starts); result = codeflash_output # 1.05μs -> 701ns (50.1% faster)

def test_performance_large_offset_large_list():
    """Test performance with large offset and large line list."""
    # Create 1000-line file with 1000-byte lines
    line_byte_starts = [i * 1000 for i in range(1000)]
    
    # Test offset far into the file
    codeflash_output = _byte_to_line_index(900000, line_byte_starts); result = codeflash_output # 1.19μs -> 912ns (30.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr1199-2026-02-20T14.22.56 and push.

@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 20, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 20, 2026

claude bot commented Feb 20, 2026

PR Review Summary

Prek Checks

Fixed 3 issues (auto-fixed by ruff):

  • I001 unsorted-imports: from bisect import bisect_right moved to correct position
  • F401 unused-import: removed now-unused import bisect
  • FURB136 if-expr-min-max: reverted idx if idx > 0 else 0 back to max(0, idx) per linting rules

All prek checks now pass.

Mypy

19 pre-existing errors in instrumentation.py (missing type annotations, untyped functions). None introduced by this PR.

Code Review

No critical issues found. The PR makes a single micro-optimization:

  • Replaces bisect.bisect_right() with a direct import _bisect_right, eliminating attribute lookup overhead on each call.
  • The max(0, idx) optimization (idx if idx > 0 else 0) was reverted by the FURB136 linting rule, so the remaining speedup comes only from the direct import binding.
  • Logic and behavior are unchanged.

Test Coverage

| File | PR Coverage | Base Coverage | Change |
|---|---|---|---|
| codeflash/languages/java/instrumentation.py | 83% | 83% | 0% |

  • Changed lines (import + _bisect_right call) are covered by existing tests
  • No coverage regression

Last updated: 2026-02-20
